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Abstract: We have developed a method for estimating protein-ligand binding free energy 
(AG) based on the direct protein-ligand interaction obtained by a molecular dynamics 
simulation. Using this method, we estimated the AG value statistically by the average 
values of the van der Waals and electrostatic interactions between each amino acid of the 
target protein and the ligand molecule. In addition, we introduced fluctuations in the 
accessible surface area (ASA) and dihedral angles of the protein-ligand complex system as 
the entropy terms of the AG estimation. The present method included the fluctuation term 
of structural change of the protein and the effective dielectric constant. We applied this 
method to 34 protein-ligand complex structures. As a result, the correlation coefficient 
between the experimental and calculated AG values was 0.81, and the average error of AG 
was 1.2 kcal/mol with the use of the fixed parameters. These results were obtained from a 
2 nsec molecular dynamics simulation. 

Keywords: protein-ligand docking; molecular dynamics simulation; protein-ligand binding 
free energy 



Pharmaceuticals 2012, 5 



1065 



1. Introduction 

The protein-ligand binding free energy (AG) has been calculated by various computational methods. 
Many protein-ligand docking programs have been developed to estimate AG [1-7], but the existing 
docking software is relatively inaccurate [1,2]. There is an almost 50% success rate of reproducing a 
protein-ligand complex structure within a root mean square deviation (RMSD) of <2 A [6,7]and the 
accuracy of AG estimation remains at approximately 2-3 kcal/mol [6-10]. 

There have been several reports on protein-compound docking and free energy calculation by 
molecular dynamics (MD) simulation. Even if a protein-ligand complex structure is unknown, ab initio 
MD docking simulations show protein-ligand complex structures and free energy landscapes [11-14]. 
Generalized ensemble methods have been adopted for wide conformational searches [15-18]. 

In an explicit water model, if a protein-ligand complex structure is known, the binding free energy 
and the potential of mean force (PMF) along the dissociation path can be obtained by using the filling 
potential (FP) method [18], the meta-dynamics method [19,20], the smooth reaction path generation 
(SRPG) method [21], or Jarzynski's method [22]. We previously proposed the FP and SRPG methods 
[18,21], each of which generates a reaction path (dissociation path) of the ligand and calculates the free 
energy surface along the path based on ab initio MD simulation. The other trend is the application of 
Jarzynski's equation [22]. In this method, a harmonic potential that restrains the ligand at a particular 
position moves slowly and leads the ligand from the binding state to the dissociation state, and the free 
energy profile is calculated. Among these methods, MP-CAFFE has been applied to various species, 
and the AG estimation error was almost 1 kcal/mol [23]. 

The molecular-mechanics Poisson-Boltzmann surface-area (MMPBSA) method [24] and the linear 
interaction energy (LIE) method [25] have successfully been used to reproduce the trend of AGs for a 
single target protein. These methods are much faster than the ab initio MD methods described above. 
In the LIE method, AG is evaluated based on the average van der Waals (vdW) energy and the average 
electrostatic energy. The weight parameters of the vdW and electrostatic terms are optimized for each 
target. To apply the LIE method, multiple active compounds and their docking poses are necessary in 
order to optimize the parameters for each target protein. 

The COMBINE method is based on the assumption that biological activities can be correlated with 
a linear combination of a subset of the van der Waals and electrostatic terms of the interaction energies 
between a ligand and its surrounding protein residues (such as the target receptor) [26,27]. The 
protein-ligand binding free energy AG is given by: 



where Er w and Er are the van der Waals and electrostatic terms of the interaction energies, 
respectively, between the ligand and the i-th residue of a protein (the target protein) and c is a constant. 
w, vdw and wf le are parameters to be determined to reproduce the experimental data. 

The coefficients of Equation (1) (w, vdw and w, eie ) could be determined by partial least squares (PLS) 
analysis. As is the case with the LIE, to apply the COMBINE method, multiple active compounds and 
their docking poses are necessary in order to optimize the parameters for each target protein. It has 
been shown that the COMBINE analysis predicts binding free energies with good accuracy and also 
identifies important amino acid residues for the improvement of affinity [26,28-32]. 




(1) 
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In the present study, we propose a AG estimation method based on the direct protein-ligand 
interaction obtained by molecular dynamics simulation. We introduced the entropy term and the local 
effective dielectric constant, and modified the van der Waals potential to improve the accuracy of the 
present method so that it does not require multiple active compounds to predict the AG value. 

2. Results and Discussion 

2.1. Theoretical Background 

AG is calculated by Zwanzig equation as follows [33]: 

AG = -k B T * In < exp(-A£/ l(k B T)) > h 

AU = U h -U u (2) 

where Ub, U u , and < >b represent the potential of the protein-ligand bound and unbound states and the 
average over the bound-state trajectory, respectively. 

Kubo's cumulant expansion gives the following equation excluding the log and exp functions as [34]: 

AG =< AU > 
1 



2k B T 



< (AU- < au >y > 



+ l —— < (AU- < AU >) 3 > ( 3 ) 

6(k B T) 2 

1 <(AU-<AU>) 4 >+-- 



24(k B T) 



The first term (linear term) corresponds to the enthalpy, and the higher-order term corresponds to the 
entropy. The second term becomes: 

\2 ^ , i r,2 Tirr , irr. , , ir7.2. 



< (AU-< AU >) >=<AU l -2AU <AU > + <AU > z > 
=< (U b -UJ 2 - 2(U b -UJ<U b -U u > + <U b -U u > 2 > 



=< Ui - 2U h U u + U z u - 2U h <U h > +2U h < U u > +2U U <U b > -2U U <U U > + 
<U b > 2 -2<U b ><U U > + <U U > 2 > 



(4) 



When we assume: 



< U h U u >=< U b >< U u > 



(5) 



Then, the second term of Equation (3) becomes: 

=< (U b -< U b >) 2 > + < (U u -< U u >) 2 > (6) 

And Equation (3) becomes: 



AG=<U b > b -<U u >„ 

- {< (U b - < U„ > b ) 2 > h + < (U u - < U u > u f >„ } 
2k B T 



(7) 



1 

6(VT 



+ TTT^T {< (U b -<U h > b f > b -<(U U -<U U >„) 3 >„} 
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The linear term (the first and second terms; <Ub>b ~ <U U > U ) of Equation (7) corresponds to the LIE 
approximation, when the energy difference is due to the receptor-ligand and solvent-ligand interaction 
energies. The LIE approximation calculates AG by: 

AG = a{< E vdW R - L > RL - < E vdw s- L > SL ) + J3(< E e \- L > RL - < E de s-l > sl ) ( 8 ) 

where E vdw x and E ele x are the protein-ligand van der Waals interaction energy and the electrostatic 
interaction energy between the ligand and the surrounding molecules, R-L/S-L represents the 
interaction of the protein-ligand complex system/solute-ligand system, and the brackets (< >x) 
represent the average over simulation of the protein-ligand complex system (RL) or the solute-ligand 
system (SL). The LIE equation includes two parameters: a and p. These parameters are known to 
reproduce the experimental data for each target protein. 

The COMBINE approximation calculates the AG value by: 

AG = ^w vdw (i) < E vdw (i) R _ L > RL +^w ele (i) < E°\i) R _ L > RL 
i i 

where the E vdw (i) and E ele (i) are the protein-ligand van der Waals interaction energy and the 
electrostatic interaction energy between the i-th residue of the protein and the ligand, and w is the 
parameter. The simulation of the ligand-solution system is not necessary. 

Both the LIE approximation without solvent and the COMBINE approximation with the 
residue-independent w parameter gave the same equation: 

^simple = CC < E VdW R-L > RL +/3 < E de R-L > K (1Q) 



2.2. Entropy Term 

In the present study, we introduced the entropy term in Equation (10) as follows. We call this 
method the direct interaction approximation without solvent (DIAV) method: 

AG D1AV = «X< EVdW (0 > x e- a2 * Svdw(i) + ^(0 > xe^ 2xSe ' e(,) +zxS x (n) 

i i 

where E vdw (i) and E ele (i) are the vdW and electrostatic interactions between the i-th residue of the 
protein and the ligand, respectively. Svdw(i) and Sele(i) are fluctuations of E vdw (i) and E ele (i) during 
the molecular dynamics simulation, respectively. The x*S x term represents the energy fluctuation of 
the system corresponding to the second-order term of Equation 7 ((<Ub - <Ub>b) 2 ). 

In Equation (11), the t*S x term is the fluctuation of energy, but we found that the energy fluctuation 
itself is not suitable for evaluating AG. Instead of the energy, S x is the fluctuation of a property x that is 
related to the energy. The properties x in the current study are the accessible surface area (x = ASA), 
the dihedral angles (x = DIH), the vdW potential (x = vdW), and the electrostatic potential (x = ELE) 
of the protein-ligand complex structure. In the present study, we determine which property is best for 
estimating AG. There are five parameters: a, a2, P, P2, and t. 
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2.3. Modification of van der Waals Potential Term 

To represent the van der Waals (vdW) interaction, a Lennard- Jones (LJ) 12-6-type function is used. 
In the docking score, the vdW interaction (lipophilic atom contact) term represents both the vdW 
interaction and the cavity formation energy in solvent; in water, the latter is 10 times greater than the 
vdW interaction. This function gives very large values when atomic conflicts occur. To reduce these 
conflicts, an LJ 9-6-type function has been used in a protein-ligand docking study [3]. In general, the 
absolute value of the vdW interaction is much smaller than the AG value. The LJ 12-6 value represents 
the atomic contact and its hydrophobic interaction. Thus, in the present study, we apply LJ 12-6, 
LJ 9-6, LJ 8-4, and LJ 6-3-type functions as follows: 

Euu- 6 =^[(^) 12 - A 6 ], welldepth=s, R e = 6 JlR 0 (12) 
K K 

E U9 - 6 = ^r^Y y " ( ^ )6] ' wel1 depth= s ' Re = \f* 0 ( 13 > 



Eus-4 = 44 A 8 -(^) 4 L well depth= s, R e = i/2R 0 
R R 



(14) 



= 44 A 6 " A 3 ] , well depth= s, R e = UlR 0 (15) 
K K 

where R e is the equilibrium distance. The R e and the well depth values are set to the same values 
obtained from AMBER param99 [35] and the general AMBER force field (GAFF) [36]. 

The data-sampling MD simulation is performed with the conventional AMBER force field (LJ 12-6 
potential), and the analysis is performed using Equations (12)— (15). 

2.4. Effective Dielectric Constant 

In the ligand-binding pocket, the effective dielectric constant (s e ff) should be different at each point, 
since the s e ff values of proteins are 2-4 and the s e ff of water is 78.5. The E ele (i) should be scaled by the 
s e ff. We introduced the modification of the electrostatic interaction as follows (we call this method the 
direct interaction approximation with solvent (DIAS) method): 



ACW = «X< EVdW (0 > >< ^ 2xSvdn,) + /£< E mo / e (0 > xe~" 2XSem + rx S 



(16) 



where E mo / le (i) is the E ele (i) value scaled by the s e ff. The s e ff value could be calculated from the ratio 
between the electrostatic force calculated in the explicit water model and that in vacuum, as follows: 

EtM = jrll Efif) (17) 

b eff j 

where Ej ele (i) is the electrostatic interaction between the i-th residue and the j-th atom of the ligand in 
vacuum. The following scale factor might be a candidate: 

1 (F. rea/ • F" al ) 

E eff (A 'A ) 
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or: 

1 



where Fi and F™ are the electrostatic force acting on the i-th atom of the protein considering the 
solvent and not considering the solvent in the explicit water model, respectively. The F real and F vac 
were calculated by the molecular dynamics simulation in the explicit water model and in vacuum, 
respectively. 

The scale factor s' e ff by Equation (18) or (19) could be unrealistically large when the denominators of 
Equations (12) and (13) are nearly zero. Thus, we introduce a parameter x and the scale function as follows: 

s l eff =l + exp(-xx^ # ) (20) 

The value of s e ff in Equation (20) is 1 < s e ff, while the actual s e ff value could be less than 1 . But we 
introduced parameter P in Equation (16), thus the actual s e ff parameter is s e ff/p. In the following 
analysis, the factor s e ff in Equation (20) is used as the actual scale factor. 

2.5. Examination of Entropy Term 

We applied the DIAV method (Equation (11)) to the protein-ligand complex structures to examine 
the entropy property term Sx and performed the leave-one-out cross-validation test, as summarized in 
Table 1, which also summarizes the optimized parameters. 



Table 1. Cross-validation results obtained by Equation (10) and the DIAV method (Equation (11)). 



Statistics 


A(j s i m pie 

(Equation 10) 


ELE a 


vdW a 


ASA a 


DIH a 


Average error (kcal/mol) 


2.22 


2.30 


1.85 


2.06 


1.94 


R 


0.72 


0.70 


0.72 


0.71 


0.67 


a 


0.22 


0.22 


0.18 


0.17 


0.15 


P 


0.017001 


0.016411 


0.005958 


0.012600 


0.010430 


x*100 




-0.078460 


-28.506610 


-0.026605 


-0.000696 



The vdW potential is the LJ 12-6 type, a: property (x) of Equation (11). Here a2 = P2 = 0. 
The energies are presented in kcal/mol, and R represents the correlation coefficient. 



In the leave-one-out cross-validation test, one data is selected as the test data that is to be predicted 
and the other data are used as the teaching data to generate the prediction model equation. The test data 
is selected one after another in the given data set until all data are selected as the test data. The vdW 
energy term was set to an LJ 12-6 function, and the dielectric constant was set to 1. The values of the 
parameters a2 and P2 were set to zero. The ASA parameters (atomic solvation parameter and radius of 
each atom) were obtained from a previous study [37]. In the present study, the parameters of Equation (11) 
were optimized by the least-squares deviation error of the AG values. Compared to the results obtained 
by the simplified version of the COMBINE method (Equation (10)), the DIAV method (Equation (11)) 
slightly improved the accuracy of AG. 



^jjreal real ^ 



(19) 
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2. 6. Examination of vdW Term 

We examined the DIAV method with the vdW term using Equations (12)— (15). The results and the 
optimized parameters are summarized in Table 2. The s value was set to 1. The entropy property x was 
set to the ASA. We also examined the case of x = DIH, and the result was quite similar to that obtained 
in the case of x = ASA. The LJ 8-4-type function gave the best result among the four functions (LJ 12- 
6, LJ 9-6, LJ 8-4, and LJ 6-3) in the leave-one-out cross-validation test, while the accuracy obtained 
was similar among the functions. Thus, the LJ6-3 and LJ4-2-type functions were not used in the 
following study; instead we focused on the LJ8-4 function. 

Table 2. Cross-validation results obtained by the DIAV method (Equation (11)) to 
examine the van der Waals potential type. 



Statistics 


LJ9-6 


LJ8-4 


LJ6-3 


Average error 


2.26 


1.75 


1.89 


(kcal/mol) 


R 


0.69 


0.76 


0.71 


a 


0.1727 


0.0428 


0.0066 


P 


0.0139 


0.0072 


0.0078 


t*10000 


-2.9273 


-2.5677 


-2.8531 



The energies are presented in kcal/mol, and R represents the correlation coefficient. 

The vdW parameters represent both the protein-ligand vdW interaction and the hydrophobic 
interaction. In the present study, however, the number of data were limited to the optimization of the 
parameters, and then we used just the original vdW parameters. 

2. 7. Examination of a2 and (52 Parameters 

We examined the parameters a2 and (32 of the DIAV method (Equation (11)). The vdW potential 
was set to the LJ 8-4-type function, and the dielectric constant was set to 1 . The entropy property x was 
set to the ASA. We also examined the case of x = DIH; the result was quite similar to that obtained in 
the case of x = ASA. The leave-one-out cross-validation results and the optimized parameters are 
summarized in Table 3. The optimized a2 and P2 were about 0.01 and -0.0013, respectively, and the 
modulated vdW and electrostatic energy values were close to the original (intact) values. Actually, the 
parameters a2 and P2 improved the AG estimation accuracy, and the equation includes five parameters 
(a, P, x, a2, and P2). The two additional parameters (a2 and P2) slightly improved the average 
accuracy. 

2.8. Examination of Effective Dielectric Constant Term 

We applied the idea of the effective dielectric constant. We applied the DIAS method (Equation (16)) to 
the estimation of AG using the s e ff defined by Equations (18) and (19). The leave-one-out cross- 
validation results and the optimized parameters are summarized in Table 4. 
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Table 3. Cross-validation results obtained by the DIAV method (Equation (11)) to 
examine al and P2 parameters. 



Statistics 


ASA 


DIH 


Average error 


1.63 


1.59 


(kcal/mol) 


R 


0.80 


0.76 


a 


0.04146 


0.03832 


P 


0.00643 


0.00491 


x* 10000 


-2.74887 


-0.06949 


a2 


0.0093 


0.0093 


P2 


-0.0013 


-0.0015 



The vdW potential is the LJ 8-4 type. The energies are presented in kcal/mol, and R represents the 
correlation coefficient. 



Table 4. Cross-validation results obtained by Equation (10), the DIAV (Equation (11)), 
and the DIAS (Equation (16)) methods. 





AGexptl 

(kcal/mol) 


*-» XJ simple 


l\ \ «r»i * \t 
v 1 1) 1 \ \ 






\ £A| lid iiuii y±\j)) 


( Frin it inn (\ \ X\ 






(kcal/mol) 


(kcal/mol) 


(kcal/mol) 


labe 


-9.57 


-5.46 


-6.27 


-6.68 


labf 


-7.39 


-6.30 


-6.67 


-6.90 


lapu 


-10.50 


-13.50 


-11.98 


-11.76 


ldbb 


-12.27 


-8.75 


-11.79 


-11.69 


Idbj 


-10.47 


-8.35 


-12.27 


-12.10 


Idog 


-5.48 


-5.40 


-6.09 


-6.12 


ldwb 


-3.98 


-3.69 


-4.83 


-5.05 


lepo 


-10.85 


-17.25 


-14.82 


-15.56 


letr 


-10.09 


-9.91 


-10.35 


-10.08 


lets 


-11.62 


-11.05 


-11.82 


-11.52 


lett 


-8.44 


-9.46 


-9.99 


-9.75 


lhpv 


-12.57 


-14.02 


-12.88 


-12.78 


lhsl 


-9.96 


-6.53 


-6.74 


-7.18 


lhtf 


-11.04 


-12.45 


-11.12 


-11.00 


lhvr 


-12.97 


-16.98 


-14.67 


-14.95 


lnsd 


-7.23 


-7.44 


-8.33 


-8.13 


IpgP 


-7.77 


-11.01 


-11.09 


-10.24 


lphg 


-11.81 


-6.88 


-8.03 


-8.22 


lppc 


-8.80 


-9.83 


-8.66 


-8.85 


lpph 


-8.49 


-8.50 


-7.87 


-8.00 


lrbp 


-9.17 


-9.29 


-8.58 


-8.91 


ltng 


-4.00 


-4.15 


-4.64 


-4.90 


ltnh 


-4.59 


-3.54 


-4.24 


-4.61 


lulb 


-7.23 


-3.82 


-5.71 


-5.74 
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Table 4. Cont. 





AGexptl 
(kcal/mol) 


AGsimple 


AGDIAV 


AGDIAS 


PDB ID 


(Equation (10)) 


(Equation (11)) 


(Equation (1( 




(kcal/mol) 


(kcal/mol) 


(kcal/mol) 


2cgr 


n no 

—9.92 


O" no" 

— /.0 / 


1 n c\a 

— iu.94 


i n oo 


2gbp 


1 f\ 1A 

— 10.36 


o nc 


n 

—9.2/ 


n hh 

—9. / / 


ZlID 


H A 1 

— /.41 


—9.5 1 


O CI 

— o.5j 


o^o 

O.JO 


zpnn 


A 1 O 

—0.3a 


A f\Cl 

— 4.uy 


A Ol 

— o.oj 


A net 
—6. /9 


zrl)4 


O A O 
— 0.48 


— 10.39 


— 1U.J 1 


1 n oa 
— 10. 2o 


2tsc 


1 1 AO 

— 1 1.02 


1 1 nc 
—1 1.05 


O AO 


o o o 
— a. 2a 


2ypi 


a c O 

6. jo 


C A f\ 

— J. 41) 


C TO 

—5. 11 


A /I C 

—6.45 


3ptb 


A /I A 

—6.46 


—4.93 


c no 

—5.02 


/i c c 
—4.55 


A A A. 

4cur 


— 13.23 


1 1 CO 

—1 1.52 


1 i m 
—13.93 


1 1 CO 

— 13.52 


5abp 


O nc 

— y.uj 


a a/i 

— 0.04 


/.iy 


H CO 
— /.59 


Averageerror 




1.88 


1.30 


1.22 


R 




0.73 


0.81 


0.81 


a 




0.0503 


0.0378 


0.0307 


P 




0.0125 


0.0082 


0.0118 


t*10000 






-2.4178 


-2.4312 


a2 






0.0093 


0.01 


P2 






-0.0011 


-0.00312 


X 








0.6 



The vdW potential is the LJ 8-4 type. The property x of Sx is the ASA. The energies are presented 
in kcal/mol, and R represents the correlation coefficient. 

As with the results described above, the best property x among the four properties (ASA, DIH, 
vdW, and ELE) was the ASA. The DIAS results obtained by Equation (19) were better than those 
obtained by Equation (18). The DIAS results in Table 4 were obtained by using Equation (19). The 
consideration of s e ff slightly improved the AG estimation. As a result, the correlation coefficient 
between the experimental and the calculated AG values was 0.81, and the average error of AG was 1.2 
kcal/mol. This result greatly improved the results obtained by Equation (10). Figure 1 shows the 
correlation between experimental and calculated AG values obtained by the DIAS method (Equation (16)). 

Figure 1. Cross-validation results obtained by the DIAV method. The experimental data 
(AGexpti) and the calculated value (AGdiav). 



-2 




•18 1 

11 -12 10 -8 -6 1 -2 

AG„ %Tt , (kcal/mol) 



We examined the time-dependency of the AG obtained by the DIAS method. After 1 nsec MD 
simulation for equilibration, the sampling runs of 0.5 nsec, 1 nsec, 1.5 nsec and 2 nsec were performed. 
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The AG values did not depend on the sampling-time length so much. Namely, the average error over 
the 34 target protein-ligand complexes were 1.43 kcal/mol, 1.22 kcal/mol, 1.23kcal/mol and 
1.23 kcal/mol for the 0.5 nsec, 1 nsec, 1.5 nsec and 2nsec sampling times, respectively. The initial 
structures of these simulations were the experimentally obtained protein-compound complex 
structures. Thus, the protein-compound interaction did not depend on the sampling-time length 
so much. 

The results showed that the current method worked well for various target proteins. This method 
could be extended as: 

AG = «X< E vdW d)> + ^< ^(0 > + IX x S x 

i i X=ASA,DIH,RMSD,vdW,ele V ' 

where a, 0:2, (3, P2, and t x are the parameters. This extension is one of the generalized forms of 
Equation (16). We examined the combination of two properties out of five. The averaged error was 
increased by the combination of two entropy terms. Thus, Equation (16) is simple and accurate 
compared to Equation (21). 

We applied the generalized Born surface area (GBSA) method [37-39] for the AG calculation to the 
same protein-compound set used in the current study. The average error and the correlation coefficient 
between the experimental and calculated AG values were 51.7 kcal/mol and 0.03 that showed very 
weak correlation, respectively. The GBSA method is good to reproduce the trend of AG values of 
many ligands for one target protein. In the current study, each target protein has only one or a few 
ligands. The error of the AG obtained by the GBSA method is large, thus, the GBSA method could not 
reproduce the trend of the AG value in the current study. In this examination, the DIAS/DIAV methods 
showed the better results than the GBSA method. 

2.9. Application to Docking-Pose Prediction 

To evaluate the present method, we applied the DIAS method to the protein-ligand docking pose 
prediction. Usually, only 20%-30% of the docking poses generated by the protein-ligand docking 
program are correct (RMSD < 2 A) in the cross-docking test, whereas 50%-70% of the docking poses 
generated by the protein-ligand docking program are correct (RMSD < 2 A) in the self-docking test [7]. 
Of course, the cross-docking test is necessary for practical evaluation of the protein-ligand docking. In 
this section we mimicked the cross-docking test. We selected the docking poses by both the DIAS 
method (Equation (16)) and the docking program (Sievgene/myPresto [7]), then compared the results. 
The docking score of Sievgene was determined as: 

Score = c rot ■ N rot +c AV - (E ASA + E vdw ) + c ele ■ E ele + c hyd ■ E hyd + c mtra _ vdw ■ E mt (2 2) 

where N rot , E A sa, E vc iw, E e i e , Eh yc i, and E intra -vdw represent the number of rotatable bonds of the ligand 
molecule, the hydrophobic energy due to the accessible surface area, the vdW energy, the 
protein-ligand Coulombic potential, the hydrogen bond energy, and the intramolecular vdW energy of 
the ligand for Sievgene [7]. Also, c rot , c A v, c e k, Ch y d, and c intra . v dw are the optimized coefficient for each 
energy term. For each atom type, the sum of E A sa and E vdw gives a grid potential, and both energy 
terms are always simultaneously calculated. Thus, these two terms share the same coefficient, cav- 
Sievgene utilizes the grid potential to calculate each energy term except for the intramolecular 
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interactions. In this study, a mesh size of 60 x 60 x 60 was adopted. 

In this test, we prepared three types of protein structures: (model 1) the intact protein structure 
prepared in Section 2, (model 2) the energy-minimized structure of apo protein in water, and (model 3) 
the final structure of 2-nsec MD simulation of apo protein in water. The Sievgene docking program 
generated five docking poses for each target protein of the three prepared structures (models 1-3). 
Then each protein-ligand complex structure was evaluated by the DIAS (Equation (16)) with the fixed 
parameter described in Table 5 in the same manner described in the previous section (the vdW function 
was the LJ 8-4 type function, and the property x of Sx was the ASA). The best score poses were 
selected by Sievgene based on docking score, and the best AG poses were selected by DIAS. The 
results are summarized in Table 5. 



Table 5. Docking accuracy. 



Initial structure (intact 


Top AG structure by the 


Top scoring structure 


Best among the 


PDB coordinates: model 1) 


DIAS method 


by Sievgene 


top 5 structures 


RMSD < 1 A 


29.4% 


35.3% 


47.1% 


RMSD < 2 A 


41.2% 


76.5% 


94.1% 


RMSD < 3 A 


47.1% 


94.1% 


94.1% 


Energy-minimized 


Top AG structure by the 


Top scoring structure 


Best among the 


structure (model 2) 


DIAS method 


by Sievgene 


top 5 structures 


RMSD < 1 A 


40.0% 


6.7% 


66.7% 


RMSD < 2 A 


73.3% 


46.7% 


93.3% 


RMSD < 3 A 


80.0% 


73.3% 


93.3% 


Structure after MD 


Top AG structure by the 


Top scoring structure 


Best among the 


simulation (model 3) 


DIAS method 


by Sievgene 


top 5 structures 


RMSD < 1 A 


20.0% 


0.0% 


0.0% 


RMSD < 2 A 


33.3% 


33.3% 


33.3% 


RMSD < 3 A 


53.3% 


46.7% 


66.7% 



The vdW potential is the LJ 8-4 type. The property x of Sx is the ASA. 



When the energy-minimized structures (model 2) were used, the results obtained by the DIAS 
method were much better than the Sievgene results. The DIAS method selected the correct poses at a 
rate of 73% (RMSD < 2 A). Even if the DIAS method selected the best docking poses among the five 
poses generated by Sievgene, 93% of the five generated poses satisfy the RMSD < 2 A. Thus, the 
DIAS method selected 78% (73% out of 93%) of the correct poses. This shows that the DIAS method 
is useful for practical pose prediction in drug design. 

When the initial structures (model 1) were used, the Sievgene results were better than the results 
obtained by the DIAS method. This is a trivial self-docking test, and the MD simulations for energy 
calculation should slightly change the ligand coordinates from the crystal structures by thermal 
fluctuation. When the final structures of the MD simulation (model 3) were used, only 33.3% of the 
docking poses were correct (RMSD < 2 A) by Sievgene and the DIAS method. Still, the results 
obtained by the DIAS method were better than those obtained by Sievgene. The shapes of the 
ligand-binding pockets should be changed from their suitable structures after the MD simulations. This 
model structure is not suitable for docking studies. 
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3. Data Preparation 

To determine the coefficients for the AG score, we performed a protein-ligand docking simulation 
based on the known complex structures registered in the Protein Data Bank. Here, 34 complexes 
accompanied by the experimental binding free-energy values were selected from the database that was 
used to determine the AG scores of the PRO LEADS [6]. The PDB identifiers and the names are 
summarized in Table 6. In the test dataset, the metalloproteins were removed from the present analysis. 
Metal atoms (Zn and Fe atoms) formed covalent bonds with O and S atoms of the ligands, and the 
classical force field that we applied could not represent the covalent bond. Thus, the present method 
cannot calculate AG values for metalloproteins with high precision. 

Table 6. List of the proteins used. 



PDB ID Protein 

labe L-ARABINOSE-BINDING PROTEIN 

labf L-ARABINOSE-BINDING PROTEIN 

lapu ACID PROTEINASE (PENICILLOPEPSIN) 

ldbb FAB' FRAGMENT 

ldbj FAB' FRAGMENT 

ldog GLUCOAMYLASE 

ldwb THROMBIN 

1 epo ENDOTHIA ASPARTIC PROTEINASE 

letr THROMBIN 

lets THROMBIN 

lett THROMBIN 

lhpv HIV-1 PROTEASE 

lhsl HISTIDINE-BINDING PROTEIN 

lhtf HIV-1 PROTEASE 

lhvr HIV-1 PROTEASE 

lnsd NEURAMINIDASE 

lpgp 6-PHOSPHOGLUCONATE DEHYDROGENASE 

lphg CYTOCHROME P450 

lppc TRYPSIN 

lpph TRYPSIN 

lrbp RETINOL-BINDING PROTEIN 

ltng TRYPSIN 

ltnh TRYPSIN 

1 ulb PURINE NUCLEOSIDE PHOSPHORYLASE 

2cgr IGG2B (KAPPA) FAB FRAGMENT 

2gbp D-GALACTOSE/D-GLUCOSE-BINDING PROTEIN 

2ifb INTESTINAL FATTY ACID BINDING 

2phh P-HYDROXYBENZOATE HYDROXYLASE 

2r04 RHINO VIRUS 14 (HRV 14) 

2tsc THYMIDYLATE SYNTHASE 

2ypi TRIOSE PHOSPHATE ISOMERASE 

3ptb TRYPSIN 

4dfr DIHYDROFOLATE REDUCTASE 

5abp L-ARABINOSE-BINDING PROTEIN 
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The structural ensembles generated from the PDB structure given by MD in explicit water were 
prepared as follows. All target proteins were prepared with ligands (protein-ligand complex structure). 
The force fields and the charges of the protein atoms originated from AMBER parm99 [35]. The 
atomic charge of each ligand was determined by the restricted electrostatic point charge (RESP) 
procedure using HF/6-31G* -level quantum chemical calculations [40]. We used Gaussian98 to 
perform the quantum chemical calculations [41]. The whole structure of each protein was embedded in 
a sphere of TIP3P [42] water (CAP water), including ion particles of 0.1% Na + and CF, in order to 
neutralize the total charge of the systems. The center of the sphere was set at the mass center of the 
protein. The shortest distance between the protein atom and the CAP sphere wall was set to 10 A. 
Before an MD calculation was performed for the entire system, an MD calculation for only the solvent 
parts (solvent water and counter ions) was performed with the protein, ligand, and metal ion 
coordinates fixed, so as to bring the solvent parts sufficiently close to an equilibrium state. The 
SHAKE method was used to constrain covalent bonds between heavy and hydrogen atoms in any 
molecule in the system [43]. MD simulations of the entire system were performed using 2.0 fsec time 
steps with the temperature set at 310 K; the fast multipole method [44] was used to calculate the 
Coulombic interaction. The cutoff distance of the van der Waals interaction was 12.0 A. The MD 
simulations were performed by using cosgene/myPresto [18]. After equilibration steps of 1,000 psec, 
the protein coordinates were sampled every 1 psec. Finally, we obtained 1,000 structures for each 
target protein in the 1,000 psec production run. The software program myPresto version 4 [45] was 
used for the simulation. 

4. Conclusions 

We have developed the direct interaction approximation (DIA) method and examined both the 
direct interaction approximation without solvent (DIAV) and with solvent (DIAS) methods. The results 
showed that the inclusion of the fluctuation of the ASA/dihedral angle terms drastically improved the 
accuracy of AG. The DIAV method (Equation (16)) was the final form for the simple and accurate 
estimation of AG. The effective dielectric constant should be calculated by Equations (19) and (20), 
and the vdW potential should be the LJ 8-4-type function. This equation included six parameters: a, (3, 
a2, [32, x, and x. The six optimized parameters could be applied to all of the target proteins. 

In the explicit water model, the DIA (DIAV and DIAS) methods required only the MD simulation 
of the protein-ligand complex. The DIA method with the LJ 8-4-type function improved the accuracy 
of the calculated AG value drastically: the correlation coefficient between the experimental and the 
calculated AG values was improved to 0.8 as obtained by the DIAV method, from 0.7 as obtained by 
the simplified COMBINE method without the entropy term (Equation (10)), and the average error of 
AG was improved to 1 .2 kcal/mol as obtained by the DIAS method, from 1 .9 kcal/mol as obtained by 
Equation (10). 
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